Abstract 

The significance of machine and natural vision is 
discussed together with the need for a general ap- 
proach to image acquisition and processing aimed 
at recognition. An exploratory scheme is proposed 
which encompasses the definition of spatial primi- 
tives, intrinsic image properties and sampling, two- 
dimensional edge detection at the smallest scale, 
construction of spatial primitives from edges, and iso- 
lation of contour information from textural informa- 
tion. Concepts drawn from or suggested by natural 
vision at both the perceptual and the physiological 
level are relied upon heavily to guide the development 
of the overall scheme. The scheme is intended to 
provide a larger context in which to place the emerg- 
ing technology of detector-array focal-plane proces- 
sors. The approach differs from many recent efforts 
at edge detection and image coding by emphasizing 
edge detection at the smallest scale as a foundation 
for multiscale symbolic processing while diminishing 
somewhat the importance of image convolutions with 
multiscale edge operators. Cursory treatments of in- 
formation theory illustrate that the direct applica- 
tion of this theory to structural information in images 
could not be realized. 

Background 

Much of the meaning we acquire from the world 
around us is derived from the sense of vision. Like- 
wise, for the bulk of the animal kingdom, vision is 
so central to basic activities that survival in the wild 
crucially depends on the visual functions of organ- 
isms. Another measure of the pivotal role of vision 
is the apparent dominance of our most intricate and 
vital organ, the brain, by vision processes (ref. 1, 
an excellent introduction to natural vision). At an 
entirely different level, words such as “vision” and 
“image” have taken on meanings far beyond our im- 
mediate visual perceptions and are broad metaphors 
for qualities such as wisdom, overriding impression, 
style, essence, remarkable intellectual ability, ex- 
traordinary artistic talent, and the ability to antici- 
pate the consequences of major decisions. In a like 
manner, terms such as “insight,” “picture,” and “far- 
sighted” occupy a territory in our language reserved 
for our highest or most sweeping judgments. The 
sophistication of the visual tasks we routinely per- 
form is effectively masked by the smoothness and 
ease with which they are carried out. In contrast 
the technology of machine vision is remarkably prim- 
itive in capability. A machine capable of one sophis- 
ticated visual task, such as reading or arbitrary ob- 
ject recognition, would be considered a technological 
marvel and yet remains beyond our grasp. Major 
advances in machine vision have far-reaching appli- 
cations, with a broad, pervasive impact on economic 
development, especially in automation. The current 
state of machine-vision technology is characterized as 
a highly developed general capability for acquisition, 
transmission, and storage of images through the use 
of electronic media and a few specialized capabilities 
for performing visual tasks primarily in extremely 
cooperative or restricted situations. 

Much progress has been made in advancing scien- 
tific knowledge about natural vision; however, a com- 
prehensive, conclusive understanding of vision pro- 
cesses related to anatomical structure, physiological 
functions, and visual perception has been as elusive 
as major advances in machine-vision technology. The 
pursuit of this knowledge must be regarded as a pre- 
mier scientific goal for our time because successful 
efforts would, in large measure, answer central ques- 
tions about the brain. 

The relationship between natural and machine 
vision is not a direct one and deserves further dis- 
cussion. Both fields share the same basic questions, 
which are (1) what information must be isolated from 
the optical image to form the basis for performing vi- 
sual tasks, and (2) how is this information extracted 
from the image? Another common question, not di- 
rectly associated with the optical image, is how is 
visual memory organized and the act of comparison 
or interpretation performed? Ultimately no complete 
separation of natural vision and machine vision is ei- 
ther possible or desirable. Since no general theory 
of visual information has been accepted, machine vi- 
sion must either directly or indirectly refer to natu- 
ral vision. Usually the machine-vision references to 
natural vision are at the level of visual perception 
and invoke assumptions or definitions. Assumptions 
such as “edges in images are important” are derived 
from compelling visual perceptions rather than from 
physical laws. More recently some machine- vision 
research (ref. 2) has been stimulated by the physi- 
ological level of natural vision, that is, the organi- 
zation of retinal and cortical receptive fields. The 
difficulty of using physiological data lies with its 
incompleteness and the fact that the machinery of 
natural vision is radically different from man-made 
cameras and computers. One difference, however, 
is highly encouraging — the natural-vision system is 
built from nerve cells with extremely slow signal re- 
sponses and transmissions compared with those of 
the components of electronic circuitry. More dis- 
couraging differences are the remarkable capacities 
of neural circuitry for intricate connections, submi- 
cron structures, and elegant control of elaborate, 
high-volume parallel processes with multiple levels 
of feedback and adaptability. A more embracing 
relationship between these two fields may arise from 
basic research in visual information. Major new dis- 
coveries in natural vision such as the cortical visual 
subsystem of cytochrome oxidase regions (refs. 3, 4, 
and 5) provide much food for thought for machine- 
vision researchers. This subsystem permeates the 
primary visual pathway of the cortex, is the most 
metabolically active part of the pathway, and is found 
only in primates. While this subsystem’s ability to 
process color has been established, its spatial func- 
tions are still undefined. In a different vein, the image 
processing tools of the machine-vision researcher can 
be very useful in testing scientific hypotheses con- 
cerning natural-vision processes. 

Introduction 

The current investigation attempts to define a 
general approach to spatial vision processes. In so 
doing it relies heavily on natural-vision concepts and 
addresses the two questions of what spatial informa- 
tion contained in the optical image is required for 
visual recognition of objects and how it is to be ex- 
tracted from the image. Following a general discus- 
sion, tentative answers are given together with the 
promising results of image processing experiments. 
A key assumption suggested by natural-vision retinal 
and cortical functions is that a general set of pro- 
cesses exists which can be applied to all images and 
which produces the information necessary for object 
recognition without the need to resort to specialized 
processes for special tasks. The processing scheme 
must be regarded as preliminary and requires much 
additional refinement and demonstration. Convinc- 
ing demonstration for quite diverse images is abso- 
lutely essential in view of the absence of basic phys- 
ical laws or theory for either derivation of processes 
or confirmation of results. This investigation stops 
short of addressing any questions concerning visual 
memory and the actual process of comparison to ef- 
fect recognition. It is assumed that the extraction of 
a concise set of spatial information from an image 
will naturally assist in the future development of a 
practical memory and recognition scheme. 

General Discussion 

If the broadest view of sensing and perception is 
considered, the crux of acquiring signals from the 
physical world and processing them is the transfor- 
mation of raw signals obtained from some physi- 
cal world measurement into the symbols of knowl- 
edge. In speech recognition (ref. 6) this overall 
process involves the transformation of measured 
acoustical signals into the phonemes of language. 
Victor Zue’s efforts are highly instructive in several 
regards. His work represents a bottom-up approach 
which relates the distinguishing spectral-temporal 
characteristics of the acoustical signal to a well- 
defined symbolic vocabulary. Of special interest is 
the crucial need to account for known distortions 
and the absence of any references to higher language 
structure (i.e., grammar or word context). The anal- 
ogy to vision, if appropriate, suggests that more con- 
sideration should be given to the characteristics of 
optical images, the smallest scale of processing (and 
necessarily the first stage), and the most immediate 
transformation possible into a symbolic domain. The 
problem with any direct analogy between speech and 
visual recognition is that the joint spectral and tem- 
poral signal characteristics of sound are replaced with 
the two-dimensional spatial and spatial frequency 
characteristics of images in vision. Further, no de- 
fined symbolic vocabulary exists for vision, and far 
more complexity and arbitrariness can be expected 
with visual recognition than with speech recognition. 
In short, the problem of visual recognition is far more 
poorly defined and certainly far more complex than 
speech recognition. 

Before we approach the problem of extracting 
visual information for recognition, the larger scope of 
the following visual tasks and information is explored 
for perspective. 

1. Image acquisition, transmission, and reconstruc- 
tion for human observers 

2. Image acquisition, image compression, trans- 
mission, and reconstruction for human observers 

3. Scene representation using abstract symbols for 
human observers 

4. Scene representation with abstract symbols for 
machine perception, recognition, and scene 
description 

The lowest level visual task already accomplished 
with current technology is acquiring, transmitting, 
reconstructing, and storing images with machines. 
All interpretation is by human observers. The next 
level, which is highly experimental, is image coding — 
acquiring, compressing, transmitting, storing, and 
reconstructing images with machines. The end prod- 
uct is still presented to the human observer in a 
more or less close approximation to its original form. 
Motivation for data compression proceeds from the 
well-publicized bulkiness of images as packages of 
data and the bandwidth limitations of transmission 
links. Optical fiber transmission will greatly relax 
these bandwidth limitations, although bottlenecks may 
develop in the electro-optical conversion processes. 
At the next higher level of sophistication, the visual 
task of scene representation or image rendering with 
abstract symbols is a dramatic leap beyond image 
reconstruction. The interpretation is still by a human 
observer; however, the observer is presented not with 
a reconstruction of the original image but with a 
rendering, perhaps much like an artist’s illustration. The 
highest level of visual task is for the total scheme to 
be accomplished by machine, that is, representation 
with the use of symbols and recognition, description, 
and interpretation. In this case the human observer 
might receive only a printed report on the interpreta- 
tion of an image, that is, visual information cast into 
very brief language such as the following: “Image 542 
contains one object: a side view of an automobile of 
unknown make. No license plate is visible. Do you wish 
to have any further details?” Needless to say this 
last case is a highly imaginary one in terms of cur- 
rent technological capabilities. There is considerable 
overlap in these general visual tasks, especially in the 
highest three levels. The results of recent image cod- 
ing research (ref. 7) suggest that all these tasks could 
be served by one general-purpose image acquisition 
and low-level processing front end, provided that the 
transformation from symbolic representation back to 
a reconstruction of an image is possible. In partic- 
ular, research on second-generation image coding is 
exploring image operators which are quite similar to 
natural-vision image operators and to those used for 
symbolic representation in machine-vision research. 
The critical point is whether image renderings drawn 
from symbolic representations can be constructed so 
they are as convincing as the original images without 
being highly accurate as a pixel-by-pixel reconstruc- 
tion of intensity values. In any case an immediate 
goal of exact image reconstruction must be dropped 
in order to address the larger questions, namely, what 
is the information needed from the image which can 
be the basis for performing visual tasks, and specifi- 
cally what information is needed for recognition? 

The Choice of Contour Information as a 
Central Focus 

Reference 8 expresses the logic which, on the 
whole, is similar to that used herein. The argument 
is that the information in line drawings is the pri- 
mary basis for recognition in visual perception be- 
cause observers can correctly interpret line draw- 
ings of images without resorting to color, shading, 
texture, stereometric cues, or monocularly viewed 
three-dimensional scene presentation. A distinction 
is made herein which should be emphasized. The line 
drawing is an explicit image itself, but it represents 
a more abstract form — the spatial layout of contour 
information in the image. Further, it must be emphasized 
that contour information differs from the elaborate 
line drawing, which often contains considerable 
surface texture and shading effects. Contour infor- 
mation refers only to the significant zonal boundaries 
of an image and is represented by the most simpli- 
fied line drawings such as those in coloring books and 
graphic visual aids. Contour information is skeletal 
and not specifically concerned with surface charac- 
teristics other than their boundaries, including the 
boundary between differently textured surfaces. This 
investigation, while agreeing with the starting point 
of Walters’ treatment and the emphasis on general 
visual processes (ref. 8), differs from her work 
in a fundamental way — namely, Walters concentrates 
mainly on processing line drawing images, while the 
concern herein is transforming natural gray-scale im- 
ages into contour information which is merely repre- 
sented in an explicit display as a line drawing. 

We seek to define contour information by analyz- 
ing the elements of simple line drawings in a very 
quantitative way. To do this a grid pattern is se- 
lected. For this investigation a square grid layout 
is chosen as most representative of image spaces en- 
countered with electronic images. For natural vision 
a hexagonal layout is undoubtedly more appropriate 
in view of the retinal structuring of photoreceptors 
and neural circuitry. Since the line drawing itself is 
an explicit representation of an abstraction (i.e., con- 
tour information), we assume that the widths of the 
lines can be made vanishingly small and are irrelevant 
to our general definition. The actual scale of the grid 
elements is also not particularly important. There 
is a smallest scale definable for any image based on 
the optical blur together with the image sampling 
scheme, but we could arbitrarily make a scene larger 
than this smallest scale in an image. For this investi- 
gation, the smallest scale information in an image is 
defined as the information directly obtainable from 
one discrete image sample and its immediate neigh- 
bors, without interpolation. This consideration is of 
critical importance in the processing of an actual im- 
age but is not relevant to a general definition of in- 
formation content other than to note the existence 
of a smallest scale limit. For a definition of the el- 
ements of contour information, arbitrariness of scale 
is a primary consideration. This does not mean that 
a coarse scale can provide as good a representation as 
a fine scale, but rather that the classes of elements 
themselves do not change with the scale. The fol- 
lowing system of five elements (fig. 1) meets this re- 
quirement: null (N), simple line (S), shaped line (Sh), 
complex line (C), and end of line (E). Obviously, as 
we change scales for a particular line drawing, dis- 
tributions of these elements change. As a general 
rule the shift from fine to coarse scales increases the 
relative proportion of complex line elements while it 
diminishes the proportion of null elements. Note that 
the choice of a hexagonal rather than a square lay- 
out does not affect the definition of this system of 
line elements. 

The potential importance of this system is that 
an intrinsic logic of nearest neighbor groups can be 
established. For a nearest neighbor group of nine 
elements, 
many permutations of elements are strictly forbidden 
and many others are rarely to be encountered. An 
example of a forbidden permutation is 

    N  N  N
    S  S  N
    N  N  N

The center S should have been an E. The total number 
of arrangements of 5 classes over the 9 positions of 
a square-grid group (permutations with repetition) is 
$5^{9} = 1{,}953{,}125$, or approximately $2.0 \times 10^{6}$. 
Interestingly, a hexagonal group (a center sample and 
its six neighbors) gives 5 classes taken 7 at a time, 
or $5^{7} = 78{,}125 \approx 7.8 \times 10^{4}$ arrangements. 
This quantity is not nearly as discouraging as the 
quantity for the square grid, but it is hardly 
inspiring. Indeed, the direct application of 
information theory must be delayed in the face of this 
seemingly intractable situation, especially since a 
common simplifying assumption in information-theory 
analyses is that all arrangements occur with equal 
probability. Clearly this assumption of equal 
probabilities cannot be made for this system of small 
groups of line elements, and the theoretical 
determination of probabilities of occurrence appears 
to be quite difficult, if not impossible. 
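The counts above can be checked directly. The short Python sketch below (the function names are illustrative, not from this report) computes $5^9$ and $5^7$ and, for comparison, the entropy per group that would follow if every arrangement were equally likely. That figure is only an upper bound, resting on exactly the equal-probability assumption the text rejects.

```python
import math

def arrangement_count(num_classes: int, group_size: int) -> int:
    """Ordered arrangements of line-element classes over a neighbor group,
    with repetition allowed: num_classes ** group_size."""
    return num_classes ** group_size

def equiprobable_entropy_bits(num_classes: int, group_size: int) -> float:
    """Entropy per group if every arrangement were equally likely.
    This is only an upper bound; real line-element groups are far from uniform."""
    return group_size * math.log2(num_classes)

CLASSES = 5  # N, S, Sh, C, E
square = arrangement_count(CLASSES, 9)     # 3x3 square-grid group
hexagonal = arrangement_count(CLASSES, 7)  # center plus six hexagonal neighbors

print(square, hexagonal)  # 1953125 78125
print(round(equiprobable_entropy_bits(CLASSES, 9), 2),
      round(equiprobable_entropy_bits(CLASSES, 7), 2))  # 20.9 16.25
```

Forbidden arrangements such as the one shown above would lower the true entropy well below these bounds. The bounds also ignore the many arrangements that are merely rare.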

The construction of definite rules governing clas- 
sification of contour elements and allowable group- 
ings of nearest neighbors will be considered after the 
gray-scale image and its edge events have been exam- 
ined. The pathway from image acquisition through 
edge detection operations to contour extraction from 
spatial primitives is now considered. 

Two-Dimensional Edge Detection and 
Representation 

The questions which must be answered prior to 
performing edge detection on specific images are 
(1) what scale or scales should be chosen for edge 
operators, (2) what type of operator or operators 
should be chosen, (3) how is the operator to be 
constructed, given a sampled optical image, and 
(4) what technique should be employed for detecting 
and representing edges? Each of these questions is 
discussed in a logical sequence followed by the results 
of image processing experiments. 

Scale 

Although convolutions of multiple-scale edge or 
image-encoding operators with sampled optical im- 
ages have enjoyed considerable popularity (refs. 2 and 
7), the primacy of the smallest scale set by the spa- 
tial resolution limitations of the optical blur function 
and the image sampling scheme has not been suffi- 
ciently analyzed. The finest detail structure, along 
with much larger scale structure, is available only at 
the smallest scale, with the exception of certain im- 
age features or image conditions. These exceptions 
are (1) extended edges with a signal difference that 
is on a par with or below noise levels and (2) signifi- 
cantly blurred edges (i.e., certain shadows) or out-of- 
focus portions of a scene. These can and do occur in 
many images encountered, but these two phenomena 
rarely dominate the image unless there is a global 
defect in image quality. As a result the convolutions 
from larger scale edge operators are expected to be a 
necessary but often secondary engine in the machin- 
ery of vision. Edge detection at the smallest image 
scale is therefore explored, with a new emphasis, for 
capture of the finest detail and many larger scale fea- 
tures as well. 

Choice of Edge Operator 

A circular operator (fig. 2), which has been de- 
scriptively referred to as a Mexican hat function, is 
selected for uniform sensitivity to edges of arbitrary 
orientation, uniform suppression of two-dimensional 
low-spatial-frequency signals, its zero-crossing edge 
detection properties (ref. 2), and its ubiquitous 
occurrence in natural-vision retinal preprocessing 
(ref. 9). Various mathematical forms, which are 
essentially equivalent, have been used. Herein the 
mathematical form of a Gabor elementary signal is 
used since it places a precise form of the function 
in the framework of a full theory of communication 
(ref. 10). Further, the Gabor elementary signals have 
in various forms proven to be useful and accurate 
models for neurophysiological processing in natural 
vision (ref. 11) as well as in other sensory processes 
(ref. 10). A circular two-dimensional elementary sig- 
nal takes the form of 

$$G(r) = \exp\left(-\frac{r^{2}}{2\sigma^{2}}\right)\cos(2\pi f r) \qquad (1)$$

where $r = (x^{2} + y^{2})^{1/2}$, $\sigma$ is the Gaussian space 
constant, and $f$ is the modulation frequency. The terms 
$\sigma$ and $f$ are reciprocally related for any particular 
form of $G(r)$, so the product $\sigma f$ must be defined. 
If we invoke the constraint that the area integral over 
two-dimensional space must be zero to fully extinguish 
zero-spatial-frequency signals, a unique value of 
$\sigma f = 0.2080$ is found (fig. 3). The cross section of 
this elementary signal (fig. 4) is essentially the same 
as that of other mathematical functions often used 
(e.g., the difference of Gaussians and the Laplacian of 
Gaussian). This form establishes the exact reciprocal 
relationship between the space constant and the 
modulation frequency which must apply for any choice of 
spatial scale. 

Construction of Edge Operator From 
Weighted-Image Samples 

All scene radiance distributions have undergone 
two-dimensional convolution with the optical blur 
function and detector-array aperture response, and 
the sampling process has preset the amount of over- 
lap between adjacent samples. A full mathemati- 
cal treatment of the contributions of the optical blur 
function and the detector-array aperture functions 
to sampled images (ref. 12) contains an example of 
constructing a smallest scale difference of Gaussians 
(DOG) operator. A particular choice of weights for a 
3x3 group of image samples with a specific amount 
of blur and square detector-array apertures achieves 
an excellent smallest scale DOG. The image sam- 
ples processed with the weights are then equivalent 
to the sampling of a two-dimensional convolution of 
the DOG function with the scene radiance (intensity) 
field. Therefore, for one case the specific DOG (or, 
equivalently, the circular Gabor elementary signal) 
can clearly be constructed at the smallest scale in an 
image. 

An attempt is now made to extend this special 
case toward more generality. How difficult is this 
construction for most digital images, where opti- 
cal blur, detector-array or television vidicon point- 
spread function, and sampling lattice may all be 
unknown? Wide variations in detector element ge- 
ometries are common, so a group of nine Gaussian 
functions in a square grid is examined as a represen- 
tative case for imaging systems where the optical blur 
function dominates the detector aperture response 
in the overall two-dimensional character of the sys- 
tem response. The results (see fig. 5) are shown for 
a group wherein spacing is varied uniformly over a 
wide range relative to the optical blur space constant 
$\sigma_o$ of the individual optical Gaussians. These results 
are also equivalent to maintaining a constant spacing 
and varying $\sigma_o$. Only half of the cross section of the 
resulting two-dimensional function is shown, and it 
exhibits excellent shape quality compared with the 
circular Gabor elementary signal (or, equivalently, 
the DOG). The zero-area-integral constraint is 
maintained: the residual integral is less than 0.1 
percent of the integral of an individual Gaussian. 
Circular symmetry is well preserved except at 
$x = 2\sigma_o$. Therefore, a wide range of practical image 
conditions can be covered by the choice of one set of 
weights, even when optical blur and sampling are 
somewhat variable. 
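The nine-Gaussian construction behind fig. 5 can be sketched as follows. The weights here are hypothetical placeholders: 1 at the center, -0.15 for the edge-adjacent samples, and -0.10 for the diagonal samples, chosen only so that they sum to zero. The report's actual weights are not listed in this section. Because each Gaussian integrates to $2\pi\sigma_o^{2}$, the area integral of the composite relative to one Gaussian reduces to the sum of the weights. Circular symmetry can be checked by comparing the profile along the grid axis with the profile along the diagonal at equal radii.

```python
import math

# Hypothetical 3x3 weights (not the report's values): center, edge-adjacent, diagonal.
W_CENTER, W_EDGE, W_CORNER = 1.0, -0.15, -0.10

def composite(x: float, y: float, spacing: float, sigma_o: float = 1.0) -> float:
    """Weighted sum of nine Gaussian blur spots (space constant sigma_o)
    centered on a 3x3 square grid with the given sample spacing."""
    total = 0.0
    for i in (-1, 0, 1):
        for j in (-1, 0, 1):
            if i == 0 and j == 0:
                w = W_CENTER
            elif i != 0 and j != 0:
                w = W_CORNER
            else:
                w = W_EDGE
            dx, dy = x - i * spacing, y - j * spacing
            total += w * math.exp(-(dx * dx + dy * dy) / (2 * sigma_o ** 2))
    return total

def relative_area_integral() -> float:
    """Area integral of the composite divided by that of one Gaussian.
    Every Gaussian integrates to 2*pi*sigma_o**2, so this is the weight sum."""
    return W_CENTER + 4 * W_EDGE + 4 * W_CORNER

def symmetry_error(spacing: float, sigma_o: float = 1.0, samples: int = 40) -> float:
    """Largest gap between the profile along the x axis and along the diagonal,
    sampled at equal radii out to two grid spacings."""
    worst = 0.0
    for k in range(samples + 1):
        r = 2 * spacing * k / samples
        on_axis = composite(r, 0.0, spacing, sigma_o)
        d = r / math.sqrt(2)
        on_diagonal = composite(d, d, spacing, sigma_o)
        worst = max(worst, abs(on_axis - on_diagonal))
    return worst

print(relative_area_integral())
for spacing in (0.5, 1.0, 1.5, 2.0):  # spacing in units of sigma_o
    print(spacing, round(symmetry_error(spacing), 4))
```

Varying `spacing` at fixed `sigma_o` is equivalent to the converse, as the text notes. Only the ratio of the two enters the composite shape.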

The fairly general application of this set of weights 
has important implications for the design of general- 
purpose, front-end detector-array image-plane processor 
hardware. The weights used here were determined by 
dividing the circular Gabor function into nine squares, 
integrating over each square, and normalizing so that 
the center square has unit weight and the two distinct 
kinds of surrounding square (edge-adjacent and diagonal) 
are expressed relative to the center square. A 
quite different empirical approach (ref. 12) has deter- 
mined the same values for the weights. The hexago- 
nal array should present an easier problem since it 
possesses intrinsic circular symmetry, equal-valued 
perimeter weights, and densely packed circular de- 
tector apertures. 
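The weight construction just described can be sketched as follows. The pixel pitch relative to $\sigma$ is an assumption: 2σ is used purely for illustration, since this section does not state the value behind the report's weights. The midpoint-rule quadrature is likewise an illustrative choice. The sketch shows the procedure of integrating $G$ over nine squares and normalizing to the center square. It does not reproduce the report's numerical weights.

```python
import math

SIGMA_F = 0.2080  # zero-area product sigma*f for the circular Gabor signal

def gabor(x: float, y: float, sigma: float = 1.0) -> float:
    """Circular Gabor elementary signal G(r) of equation (1)."""
    r = math.hypot(x, y)
    f = SIGMA_F / sigma
    return math.exp(-r * r / (2 * sigma * sigma)) * math.cos(2 * math.pi * f * r)

def square_integral(cx: float, cy: float, pitch: float, n: int = 60) -> float:
    """Midpoint-rule integral of G over a pitch x pitch square centered at (cx, cy)."""
    h = pitch / n
    total = 0.0
    for i in range(n):
        x = cx - pitch / 2 + (i + 0.5) * h
        for j in range(n):
            y = cy - pitch / 2 + (j + 0.5) * h
            total += gabor(x, y)
    return total * h * h

def three_by_three_weights(pitch: float) -> list[list[float]]:
    """Integrate G over nine squares and normalize to the center square.
    By symmetry only one edge-adjacent and one diagonal square need computing."""
    center = square_integral(0.0, 0.0, pitch)
    edge = square_integral(pitch, 0.0, pitch) / center
    corner = square_integral(pitch, pitch, pitch) / center
    return [[corner, edge, corner],
            [edge, 1.0, edge],
            [corner, edge, corner]]

# Assumed pixel pitch of 2*sigma (hypothetical; not stated in this section).
weights = three_by_three_weights(pitch=2.0)
for row in weights:
    print(" ".join(f"{w:+.3f}" for w in row))
```

With this pitch the center square lies mostly inside the positive core of $G$ (which ends at $r \approx 1.2\sigma$), so the surrounding weights come out negative.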

Two-Dimensional Edge Detection and 
Representation at the Smallest Scale 

In a discrete sampled image in which the cir- 
cular Gabor function is constructed at the small- 
est scale, a discrete two-dimensional convolution of 
two two-dimensional functions results. This process 
is, in effect, a stepped integration of the circular 
Gabor function and the scene intensity or relative 
radiance distribution. Previous work (ref. 2) on de- 
tecting edges by zero crossings has emphasized larger 
scale operator sizes and more samples of edge con- 
volution signals than are available at the smallest 
scale in discrete samples of images. As we continu- 
ously convolve an edge signal with the circular Ga- 
bor function along any direction other than paral- 
lel to the edge, a characteristic curve occurs (fig. 6). 
For the smallest scale discrete image samples, only 
a few points on this curve are available and their 
exact placement is completely arbitrary, but the rel- 
ative spacing is determined by the image sampling 
lattice. The minimum number of samples available 
for each event is six in the local neighborhood of nine 
image samples (fig. 6(b)). It seems natural to ques- 
tion why six or more samples are needed to deter- 
mine what seems to be three edge locations. We 
could take an essentially one-dimensional approach 
and sift through a line of convolution values and place 
an edge location wherever a zero crossing occurs. 
If we do this in the example, we are immediately 
faced with a dilemma. The zero crossing often occurs 
between two samples, so where do we put the edge 
location? Well, we could establish a convention and 
place all edge locations either to the right or to the 
left of the actual interpixel zero crossing. Consider 
another example — perhaps the simplest test of spa- 
tial resolution — two bars at the scale of the smallest 
image samples (fig. 7). If we apply this same treat- 
ment to the discrete convolution samples, the result is 
a perfectly meaningless dense and structureless mass 
of detected edge events! This example is particularly 
instructive since we detect all the edges correctly 
but fail to detect and represent something equally 
important — namely, where edges are not located! 
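The dilemma can be reproduced in one dimension. The sketch below uses a hypothetical three-tap kernel, [-0.5, 1, -0.5], as a stand-in for a cross section of the circular operator (it is not the report's operator). The input is two one-sample bars in the spirit of fig. 7. Marking the sample beside each crossing turns the bar, gap, bar pattern into an unbroken run of edge marks. Placing edges on the odd cells of a grid magnified by 2 instead leaves the even cells, which stand for the samples themselves, free to record where there is no edge.

```python
def correlate_1d(signal: list[float], kernel: list[float]) -> list[float]:
    """Same-length 1-D correlation; samples outside the signal count as zero.
    For a symmetric kernel this equals convolution."""
    half = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + k - half
            if 0 <= j < len(signal):
                acc += w * signal[j]
        out.append(acc)
    return out

def crossing_pairs(values: list[float], zero_band: float = 0.0) -> list[int]:
    """Indices i where values[i] and values[i + 1] have opposite polarity
    outside the +/- zero_band interval."""
    pairs = []
    for i in range(len(values) - 1):
        a, b = values[i], values[i + 1]
        if (a > zero_band and b < -zero_band) or (a < -zero_band and b > zero_band):
            pairs.append(i)
    return pairs

# Two one-sample-wide bars separated by a one-sample gap.
bars = [0, 0, 0, 1, 0, 1, 0, 0, 0]
# Hypothetical 1-D stand-in for the cross section of the circular operator.
kernel = [-0.5, 1.0, -0.5]
conv = correlate_1d(bars, kernel)
pairs = crossing_pairs(conv)

# Convention 1: mark the sample to the left of each crossing.
left_marks = pairs
# Convention 2: grid magnified by 2; samples on even cells, edges on odd cells.
doubled_edges = [2 * i + 1 for i in pairs]

print(conv)           # [0.0, 0.0, -0.5, 1.0, -1.0, 1.0, -0.5, 0.0, 0.0]
print(left_marks)     # [2, 3, 4, 5]
print(doubled_edges)  # [5, 7, 9, 11]
```

Under the first convention, samples 2 through 5 are all marked, and nothing distinguishes the bars from the gap. In the doubled grid, the edges fall on cells 5, 7, 9, and 11. The even cells 6, 8, and 10 between them remain unmarked, which preserves the null information the one-sample convention loses.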

Now if we return to the samples of the characteris- 
tic convolution edge signal in two-dimensions (fig. 6), 
we can find two clues to resolving the dilemma: 
(1) the peaks and valleys in the curve bear infor- 
mation about null locations adjacent to edge loca- 
tions, and (2) a more two-dimensional approach to 
edge detection must be considered since each con- 
volution sample is surrounded by adjacent samples 
offering many more possible comparisons. We must 
determine which comparisons are to be made and the 
character of the representation, both of which are 
necessary to preserve smallest scale resolved spa- 
tial structure. The representation must clearly pro- 
vide for all edge locations and all adjacent null 
locations to retain unambiguous connectivity rela- 
tionships. Such a representation is shown in figure 7 
and requires a magnification factor of 2 over the orig- 
inal image sample space! We can now examine zero- 
crossing comparisons to find an approach which de- 
tects all edge locations and all adjacent null locations. 
This approach must consider that all possible edge 
and null locations include different locations within 
each image sample. The additional locations are not 
between pixels but rather reflect whether the edge 
falls more toward a specific sector of the periphery 
of the image sample as opposed to falling near the 
center of the sample. 

A scheme for two-dimensional zero-crossing detec- 
tion which is sufficient to produce this representation 
is illustrated in figure 8. This set of comparisons is 
made for each 3x3 group of image convolution sam- 
ples by stepping one sample vertically or horizon- 
tally for each subsequent set of comparisons. Each 
comparison must include a test for opposing polar- 
ity and some definition of “zero.” In the absence of 
other logic, the limits of values considered to be zero 
might best be set by the intrinsic noise level in an 
image which is either known in advance (where sen- 
sor performance is known) or estimated by examining 
areas of apparently uniform average value. 
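A simplified version of this procedure is sketched below. It is an illustration under stated assumptions, not the fig. 8 scheme:

- The 3x3 weights are hypothetical placeholders that sum to zero.
- Only horizontal and vertical neighbor pairs are compared.
- The zero band comes from the spread of samples in a user-chosen uniform patch, scaled by the operator's noise gain and an assumed multiplier `k`.

Edges are written onto a grid magnified by 2, so that sample positions stay available to carry null information.

```python
import statistics

# Hypothetical 3x3 weights summing to zero (illustrative only; not the report's values).
WEIGHTS = [[-0.10, -0.15, -0.10],
           [-0.15,  1.00, -0.15],
           [-0.10, -0.15, -0.10]]

def apply_operator(image: list[list[float]]) -> list[list[float]]:
    """Discrete 3x3 convolution; border samples are left at zero.
    WEIGHTS is symmetric, so correlation and convolution coincide."""
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            out[i][j] = sum(WEIGHTS[di + 1][dj + 1] * image[i + di][j + dj]
                            for di in (-1, 0, 1) for dj in (-1, 0, 1))
    return out

def zero_band_from_uniform_patch(image, r0, r1, c0, c1, k=3.0, floor=1e-6):
    """Estimate the +/- 'zero' limit from a patch of apparently uniform value.
    Independent sample noise of std s becomes s * sqrt(sum of squared weights)
    after the operator; k is an assumed confidence multiplier."""
    patch = [image[i][j] for i in range(r0, r1) for j in range(c0, c1)]
    gain = sum(w * w for row in WEIGHTS for w in row) ** 0.5
    return max(k * statistics.pstdev(patch) * gain, floor)

def opposite(a: float, b: float, t: float) -> bool:
    """True when a and b have opposite polarity outside the zero band."""
    return (a > t and b < -t) or (a < -t and b > t)

def doubled_grid_edges(conv, t):
    """Horizontal and vertical zero-crossing tests only (a simplified subset,
    not the full comparison set of fig. 8). Sample (i, j) maps to cell
    (2i, 2j) of a grid magnified by 2; edges land on the cells between."""
    rows, cols = len(conv), len(conv[0])
    edges = set()
    for i in range(rows):
        for j in range(cols):
            if j + 1 < cols and opposite(conv[i][j], conv[i][j + 1], t):
                edges.add((2 * i, 2 * j + 1))
            if i + 1 < rows and opposite(conv[i][j], conv[i + 1][j], t):
                edges.add((2 * i + 1, 2 * j))
    return edges

# A vertical step edge: dark on the left, bright on the right.
image = [[0.0] * 3 + [10.0] * 3 for _ in range(6)]
conv = apply_operator(image)
t = zero_band_from_uniform_patch(image, 0, 6, 0, 2)
print(sorted(doubled_grid_edges(conv, t)))  # [(2, 5), (4, 5), (6, 5), (8, 5)]
```

For this synthetic step edge, the four edge cells fall in column 5 of the magnified grid, midway between original sample columns 2 and 3. The sample cells on either side stay unmarked. The noise-free test image gives a zero standard deviation, so the `floor` value sets the zero band here.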

Edge Detection Image Processing Experiments 

The approach and procedures for edge detection 
are applied to two images which are both quite dif- 
ferent in pattern content from each other and which 
contain considerable diversity of pattern information 
within each image. In short, the two images repre- 
sent wide-ranging arbitrariness of visual phenomena. 
The images were originally in color. The green band 
was selected as the gray-scale original image in each 
case. The toy-scene image (fig. 9(a)) contains letter- 
ing at about the scale of the individual pixel samples, 
clearly defined textural surfaces of various sorts in 
certain fabrics as well as mixed zones of contour and 
texture in the curtained backdrop, and object con- 
tours such as dolls and other toys, books, and illus- 
trated objects on book covers. In contrast, the man- 
drill (fig. 9(b)) is a flurry of fur texture, some blurred 
and some in focus, with contour associated with facial 
features. The edge representations (figs. 10 and 11) 
for the two images derived from smallest scale cir- 
cular Gabor convolution and two-dimensional zero- 
crossing edge detection are dramatic illustrations of 
the distance between the edge representation and an 
ultimate, concise contour representation. 

A more finely detailed look at the edge represen- 
tations shows a consistent trend of distortions. Many 
expected continuous edges are not detected perfectly 
as a result of insufficient signal change or insufficient 
sharpness. Likewise, the particular event associated 
with the intersection of two or more edges is often not 
detected perfectly. For this situation a gap is often 
produced in the zero-crossing maps, the gap appar- 
ently resulting from the higher contrast edge swamp- 
ing the convolution at the intersection point. This 
gap reflects the inability to detect a form of singular- 
ity in the image distributions. A further distortion 
is the staircase appearance of oblique edges most no- 
ticeable in the book outlines of the toy scene. This 
appears to be an intrinsic defect of the square-grid 
sampling scheme, which presents a shaped artifact at 
the local neighborhood scale. These two distortions 
will have to be considered when edge event groups 
are classified into spatial primitives. 

A further important point to note is the presence 
of many completely insignificant edge events (isolated 
small groups or individuals) or edge events of sec- 
ondary interest (textural clouds, stipples, or hatch- 
ings). In summary this smallest scale edge detection 
process produces certain defects in contour informa- 
tion and a profusion of edge events not related to 
contour information; however, the edge representa- 
tion is quite interpretable in spite of these defects. 
One defect which is not present is any significant 
spatial distortion of edge locations. This lack of 
distortion, together with the capture of fine detail,
is the primary reason for using the smallest scale op- 
erator possible. These properties of the edge repre- 
sentation and the line elements defined from abstract 
line drawings can now guide the definition of gener- 
alized spatial primitives constructed from the edge 
representation as precursors to contour information. 

Definition of Spatial Primitives Based Upon 
the Edge Patterns 

The line elements derived from the highly ab- 
stract line drawings (fig. 1) are now referred to the 
edge events of gray-scale images. Further, the known 
distortions or limitations of the edge detection pro- 
cess must be treated. First, consider the abstract 
line drawing as an optical image with each line being 
about one pixel wide in the sampled images. The 
edge representation of this image and that of an
equivalent gray-scale image are examined. For this
case there is a major difference between the line el- 
ements of the abstract line drawing and of the edge 
representation — that is, each line is really two edges 
when resolved in the optical image. Spatial primi- 
tives must be revised based upon edges, not lines. 
The line elements are revised as follows and now re- 
fer equally well to edge patterns in both gray-scale 
images and line drawing images: 


Element          Symbol
Null             N
Simple edge      S
Shaped edge      Sh
Complex edge     C
End of edge      E

Note that the original end-of-line element is now a 
special case of the shaped-edge element Sh. The 
end-of-edge element E is a dubious event since most 
images are of scenes with extended objects as their 
subject matter, and point phenomena (if resolved) 
would be a small circle or square with a null center. 
Therefore, E events are error situations in which 
some edge phenomena are not completely detected 
or resolved because of excessively crowded adjacent 
edge events, a shift to subthreshold contrast, or the 
presence of high levels of noise. As already noted for 
C events, actual image convolutions often produce a 
gap in the locus of zero crossings (hence, an E event) 
at the point of intersection. This does not mean that 
a C event could not occur in actual edge patterns, but 
rather that not all edge intersections produce them. 
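The classification rules themselves are not spelled out in this section. Purely as an illustration, the following Python sketch labels the center of a 3 x 3 binary edge patch (1 = edge, 0 = null) by counting its edge neighbors and, when there are exactly two, by whether they are nearly opposite. The rule set, the 135-degree threshold, and the names classify and is_nearly_straight are assumptions made for this sketch, not the rules used to produce figs. 12 and 13. Treating a 135-degree step as S mirrors the decision not to classify square-grid staircase artifacts as shaped edges, but the sketch presumes one-element-wide edges, so thicker staircases would be misread as C events.

```python
# Offsets (row, column) of the eight neighbors of the center element.
NEIGHBORS = [(-1, -1), (-1, 0), (-1, 1),
             (0, -1),           (0, 1),
             (1, -1),  (1, 0),  (1, 1)]


def is_nearly_straight(a, b):
    """True if the angle between neighbor offsets a and b is 135 degrees or more.

    Exact integer test: cos(angle) <= -sqrt(2)/2 is equivalent to
    dot < 0 and 2 * dot**2 >= |a|**2 * |b|**2.
    """
    dot = a[0] * b[0] + a[1] * b[1]
    norm2 = (a[0] ** 2 + a[1] ** 2) * (b[0] ** 2 + b[1] ** 2)
    return dot < 0 and 2 * dot * dot >= norm2


def classify(patch):
    """Label the center of a 3 x 3 binary edge patch (hypothetical rules).

    center null                          -> "N"
    0 or 1 edge neighbors                -> "E"  (edge stops: undetected or unresolved)
    2 neighbors, nearly opposite         -> "S"  (straight run or staircase step)
    2 neighbors, sharper angle           -> "Sh" (bend or corner)
    3 or more neighbors                  -> "C"  (branching or intersection)
    """
    if patch[1][1] == 0:
        return "N"
    on = [(dr, dc) for dr, dc in NEIGHBORS if patch[1 + dr][1 + dc]]
    if len(on) <= 1:
        return "E"
    if len(on) == 2:
        return "S" if is_nearly_straight(*on) else "Sh"
    return "C"


# Usage: a horizontal run is a simple edge; a right-angle corner is shaped.
print(classify([[0, 0, 0], [1, 1, 1], [0, 0, 0]]))  # S
print(classify([[0, 1, 0], [0, 1, 1], [0, 0, 0]]))  # Sh
```

Because the labels depend only on neighbor counts and relative angles, the sketch is orientation-free, like the primitives defined above.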

The detected 3x3 edge patterns of the two test
images are now classified into the general spatial
primitives. The results (figs. 12 and 13) illustrate
the ability of this set of primitives to represent an 
extensive array of contour and texture phenomena 
even in regionally mixed groups. The color code is 
green for simple edges S, blue for shaped edges Sh, 
red for ends of edges E, yellow for complex edges C, 
and gray for null N. Shape artifacts of the square- 
grid matrix are not classified as shape primitives and 
the edge intersections which result in gaps (E events) 
are left as such. Blowups of portions of both images 
(fig. 14) illustrate the local structural consistency of 
the classification scheme. The spatial primitive dis- 
tributions of each quadrant of both images (table 1) 
quantify an overall expected trend — the highest rel- 
ative frequency of occurrence for nonnull events is 
for simple edges, whose share is unexpectedly stable
at 69 to 79 percent. The two “error” classes,
E and C, vie with each other for the next highest 
relative frequency of occurrence, though this is not 
likely to be generally true for high-quality scenes with 
little or no textural phenomena. From the computa- 
tional viewpoint, absolute frequencies of occurrence 
of nonnull spatial primitives are encouragingly low, 
ranging from 12 percent (toy scene) to 28 percent 
(mandrill) for the magnified representational space 
(1024 x 1024). Furthermore, if textural events are 
removed, we can expect these absolute values to drop 
precipitously for the mandrill and somewhat for the 
toy scene. 
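The absolute and relative frequencies quoted above follow directly from table 1. The short Python sketch below recomputes them from the whole-image totals, taking the magnified representational space as 1024 x 1024 elements; the names TABLE1_TOTALS and summarize are illustrative only. The absolute nonnull fractions come out at about 11.8 percent for the toy scene and 27.9 percent for the mandrill, which round to the 12 and 28 percent quoted in the text.

```python
# Whole-image totals of nonnull spatial primitives, copied from table 1.
TABLE1_TOTALS = {
    "toy scene": {"S": 92474, "Sh": 4805, "E": 19419, "C": 6806},
    "mandrill": {"S": 211857, "Sh": 18038, "E": 29221, "C": 33163},
}

# Magnified representational space: 1024 x 1024 elements.
SPACE = 1024 * 1024


def summarize(counts, space=SPACE):
    """Return (nonnull count, absolute percent of space, relative percents)."""
    nonnull = sum(counts.values())
    absolute = 100.0 * nonnull / space
    relative = {name: 100.0 * n / nonnull for name, n in counts.items()}
    return nonnull, absolute, relative


for image, counts in TABLE1_TOTALS.items():
    nonnull, absolute, relative = summarize(counts)
    shares = ", ".join(f"{k} {v:.1f}%" for k, v in relative.items())
    print(f"{image}: {nonnull} nonnull events ({absolute:.1f}% of space); {shares}")
```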

The application of edge detection and spatial 
primitive classification to images supplies the frame- 
work for a more detailed consideration of informa- 
tion theory. If we take only the nonnull center 3x3 
edge patterns which are detected and their associ- 
ated spatial primitives, a reasonably comprehensive 
analysis of permutations is now possible (fig. 15). Al- 
though absolute frequencies of occurrence cannot be 
assigned for the contour-texture representations, the 
symbolic vocabulary of possible 3x3 configurations 
can be narrowed dramatically with reasonable confi- 
dence. We can see a potentially more radical diminu- 
tion of allowed choices for the spatial primitives than 
for the detected edge permutations. This reduction
is based only on the number of expected configurations,
without reference to their exact frequencies of occur-
rence. Determinations of frequencies of occurrence
may not be possible in a general sense because they 
may vary widely from one image to another. This 
variability is shown in the contour-texture represen- 
tations (figs. 12 and 13). A complete information 
theory analysis will only be possible if pure contour 
representations are achieved and display stable val- 
ues for frequency of occurrence. This is most unlikely 
since pure contour itself can vary widely from sparse 
to dense patterns both regionally and globally. On 
the other hand, it may be possible to bracket fre-
quencies of occurrence in some manner and thereby
quantify information. This bracketing would require 
a more extensive analysis of diverse images and the 
distributions within their spatial primitive represen- 
tations than is now being attempted. This endeavor 
is set aside here in order to concentrate on methods for
contour-texture discrimination that use the spatial
primitive representation. 
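The narrowing of the vocabulary can be made concrete with a little arithmetic. Assuming a binary edge/null representation, a 3 x 3 patch with a nonnull center has 2^8 = 256 possible configurations, so even an equiprobable code needs at most 8 bits per patch, whereas a choice among the four nonnull primitives needs at most log2 4 = 2 bits. The sketch also computes the Shannon entropy of the primitive distributions implied by the table 1 totals. These values are specific to the two test images and, as argued above, should not be read as general frequencies of occurrence. The function names are hypothetical.

```python
from itertools import product
from math import log2


def center_edge_patterns():
    """Count 3 x 3 binary edge/null patterns whose center (index 4) is an edge."""
    return sum(1 for bits in product((0, 1), repeat=9) if bits[4] == 1)


def entropy_bits(counts):
    """Shannon entropy, in bits, of a frequency table."""
    total = sum(counts.values())
    return -sum((n / total) * log2(n / total) for n in counts.values() if n > 0)


# Maximum entropies, reached only when every choice is equally likely.
patterns = center_edge_patterns()
print(f"{patterns} edge patterns -> at most {log2(patterns):.0f} bits")
print(f"4 nonnull primitives -> at most {log2(4):.0f} bits")

# Image-specific entropies from the table 1 totals (not general values).
for image, counts in {
    "toy scene": {"S": 92474, "Sh": 4805, "E": 19419, "C": 6806},
    "mandrill": {"S": 211857, "Sh": 18038, "E": 29221, "C": 33163},
}.items():
    print(f"{image}: {entropy_bits(counts):.2f} bits per nonnull primitive")
```

The two images give roughly 1.1 and 1.3 bits per nonnull primitive. Their difference illustrates the image-to-image variability that prevents a general assignment of frequencies.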

Contour and Texture Discrimination — An 
Issue for Multiscale Interpretation of Group 
Properties of Spatial Primitives 

A completely satisfactory scientific definition of 
contour and texture does not exist; however, it is 
possible to establish the overall character of each and 
how they both can be formed from the same set of 
spatial primitives. Contour is most often composed 
of continuous connected simple S and shaped Sh 
primitives with rarely occurring complex C or end- 
of-edge E primitives. Contour information is often 
rather sparse, but it can be dense for cases such as 
printed text, where the line width of lettering is at 
or near the image sample size. Dense contour in- 
formation such as text appears to have a complete 
absence of C primitives, a paucity of E primitives, 
but a significant number of Sh primitives for espe- 
cially high densities. Texture likewise can be dense 
or sparse but is expected to possess a high percent- 
age of C primitives for surfaces such as woven fabric 
or fur or a high percentage of E primitives for granu- 
lar, ripped surfaces such as painted plaster or cinder 
block. Regular striped surfaces, grids, and gratings 
whose scale is small (near or below the image sam- 
ple size) should also be considered as texture. Dense 
contour such as printed text becomes textural if it is 
made small enough to be poorly resolved. Likewise, 
grids and stripes become contour if made sufficiently 
large. We can cite examples of smallest scale 3x3 
groups of spatial primitives which can be used to con- 
struct either contour or texture patterns equally well. 
Therefore, it is quite clear that scales larger than 
3x3 must be examined to perform contour-texture 
discrimination. 

Before engaging in an exercise in contour-texture
discrimination, we can convey a more general feeling
for the visual character of each. Obviously, no set of
texture examples can be complete or capture all types,
but an attempt to compile typical or representative
forms seems necessary. (See fig. 16.) Again, the
distinguishing spatial primitive arrangements suggested
are high frequencies of occurrence of E or C, or simply
highly dense formations of S without much Sh occurrence.
The visual perception of noise is textural, so noise is
included in the texture classes.

A highly preliminary multiscale exercise in 
contour-texture discrimination is presented to illus- 
trate the potential of this purely symbolic approach. 
(See figs. 17 and 18.) No attempt has been made 
to detect and represent the contours associated with 
purely texture boundaries. Therefore, these bound- 
aries simply disappear. The methods used involve 
four scales — 3 x 3, 6 x 6, 12 x 12, and 24 x 24 square 
windows — in the spatial primitive representations, 
and obviously further development is needed. These 
methods are based upon setting limits on the max- 
imum total number of spatial primitives allowed for 
each scale and on the maximum number of E events 
allowed for each scale. Even with these simple limits,
most texture is dissolved while most contour is retained, so the promise
of multiscale symbolic processing for contour-texture 
discrimination based on spatial primitive distribu- 
tions is apparent. 
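The limit-setting procedure described above can be sketched as follows. Several details are assumptions chosen only for illustration: the windows are non-overlapping squares, the scales run in sequence from 3 x 3 to 24 x 24 with each pass working on the previous output, and the numerical limits are the ones given in the hypothetical LIMITS table. The values actually used for figs. 17 and 18 are not given here. A window is cleared to null whenever its count of nonnull primitives, or of E events, exceeds the limit for that scale.

```python
# Hypothetical per-scale limits: window size -> (max nonnull primitives, max E events).
LIMITS = {3: (6, 1), 6: (12, 2), 12: (24, 4), 24: (48, 8)}


def suppress_texture(labels, limits=LIMITS):
    """Return a copy of a spatial primitive map with 'textured' windows nulled.

    labels: list of rows, each a list of symbols "N", "S", "Sh", "E", "C".
    Scales are applied in increasing order, each on the previous result.
    At each scale the map is tiled into non-overlapping w x w windows; a
    window is set to "N" if it holds more nonnull primitives, or more E
    events, than that scale allows.
    """
    rows, cols = len(labels), len(labels[0])
    out = [row[:] for row in labels]
    for w in sorted(limits):
        max_total, max_e = limits[w]
        for r0 in range(0, rows, w):
            for c0 in range(0, cols, w):
                cells = [(r, c)
                         for r in range(r0, min(r0 + w, rows))
                         for c in range(c0, min(c0 + w, cols))]
                total = sum(1 for r, c in cells if out[r][c] != "N")
                e_count = sum(1 for r, c in cells if out[r][c] == "E")
                if total > max_total or e_count > max_e:
                    for r, c in cells:
                        out[r][c] = "N"
    return out


# Usage: a 24 x 24 map with one straight contour and two textured patches.
grid = [["N"] * 24 for _ in range(24)]
for c in range(24):
    grid[10][c] = "S"          # contour: one row of simple edges
for r in range(6):
    for c in range(6):
        grid[r][c] = "C"       # dense crossing texture (fur, fabric)
for r in range(12, 24, 3):
    for c in range(12, 24, 3):
        grid[r][c] = "E"       # sparse stipple of edge ends
result = suppress_texture(grid)
```

In this toy map the dense C patch exceeds the 3 x 3 limit and is cleared at once. The sparse E stipple passes the 3 x 3 test but exceeds the E limit at 6 x 6. The thin contour contributes only about one element per window row or column, so it survives at every scale.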

The major limitation of the methods developed 
thus far is their inability to detect and represent all 
the contours that exist perceptually between differ- 
ently textured zones and between a texture zone and 
adjacent null zones in images. This limitation is a 
subject for further investigation and it necessarily in- 
volves the question of large-scale windows. This in- 
vestigation should relate any methods developed to 
the texton theory of Julesz (ref. 13), which has been 
used so successfully to treat the perceived differences 
between adjacent texture surfaces. The Julesz theory 
treats texture discrimination as being based on local 
densities and distributions of particular features in 
the image. In this sense textons are similar to the 
spatial primitives used herein; however, the orienta- 
tion of edges figures prominently in the texton the- 
ory and is completely absent from this set of spatial 
primitives. Further, the Julesz textons refer to local 
features and structures of larger scale than the spa- 
tial primitives, which were developed as the smallest 
distinguishable units of structure. Both the textons 
and the spatial primitives used herein share an em- 
phasis on spatial discontinuities (i.e., ends of lines 
and intersections for the textons and C and E events 
for the spatial primitives). 

Conclusions 

A general vision scheme encompassing spatial 
primitive definition, image acquisition, small-scale 
two-dimensional edge detection and representation, 
spatial primitive classification, and contour-texture 
discrimination was presented and illustrated with ex- 
perimental image processing results. The scheme 
is intended as a preliminary set of methods for ex- 
tracting from optical images the significant structural 
information that is necessary for subsequent machine
object recognition and scene interpretation. Natural- 
vision concepts and visual perception were employed 
in the development of the scheme. A major limita- 
tion of the contour-texture discrimination approach 
is that no attempt is made to represent the contours 
which are perceived at texture boundaries. The prin- 
cipal results of this investigation are the following: 

1. Both edges and adjacent null elements must 
be determined and require a representation space 
which is magnified by a factor of 2 over the original 
image space. Otherwise, the finest resolved structure 
in an image is lost. 

2. General spatial primitives are defined which 
capture the smallest units of spatial structure in the 
optical image subject to limitations of the edge de- 
tection process in handling edge intersection discon- 
tinuities, artifacts due to a square-grid discrete image 
space, and ambiguities created by unresolved struc- 
ture and noise. 

3. The general spatial primitives can and do 
represent both contour and texture phenomena, and 
they improve the ability to discriminate one from 
the other through the use of multiscale symbolic 
processing. 

NASA Langley Research Center 
Hampton, VA 23665-5225 
August 30, 1988 
Table 1. Distributions of Detected Spatial Primitives

Number of events (relative occurrence, percent^a)

Event   Quadrant 1     Quadrant 2     Quadrant 3     Quadrant 4     Total image

Toy scene
S       27 986 (76)    25 691 (71)    16 628 (79)    22 169 (75)     92 474 (75)
Sh       1 491 (4)      1 458 (4)        645 (3)      1 211 (4)       4 805 (4)
E        4 458 (12)     7 851 (22)     2 945 (14)     4 165 (14)     19 419 (16)
C        2 664 (7)      1 360 (4)        743 (4)      2 039 (7)       6 806 (6)
All     36 599 (99)    36 360 (101)   20 961 (100)   29 584 (100)   123 504 (101)

Mandrill
S       60 339 (74)    57 091 (74)    46 421 (72)    48 006 (69)    211 857 (72)
Sh       5 010 (6)      4 740 (6)      3 625 (6)      4 663 (7)      18 038 (6)
E        5 664 (7)      5 774 (8)      9 333 (14)     8 450 (12)     29 221 (10)
C       10 442 (13)     9 053 (12)     5 011 (8)      8 657 (12)     33 163 (11)
All     81 455 (100)   76 658 (100)   64 390 (100)   69 776 (100)   292 279 (99)

^a Accuracy of ±1 percent.
Figure 3. Area integral of G(r) as function of of .
Figure 4. Cross section of two-dimensional circular
value of zero. 


Figure 5. Deformation of group response of nine weighted Gaussian functions for variable spatial overlap. (Axes: radial distance, r; magnitude of group response for nine weighted Gaussians.)
Figure 7. Hypothetical comparison of one- and two-dimensional edge detection and representation processes for two-bar target.
Figure 8. Smallest scale edge detection and representation process for 3 x 3 group of convolution samples.
Figure 9. Test images.

(c) Quadrant 3. (d) Quadrant 4.
Figure 10. Smallest scale detected edge representation for toy scene.
Figure 12. Smallest scale spatial primitive representation for quadrant 2 of toy scene.

Figure 13. Smallest scale spatial primitive representation for quadrant 2 of mandrill.
(b) Mandrill.
Figure 14. Details of smallest scale spatial primitive representation.
Figure 15. Exposition of detected 3 x 3 edge patterns and associated spatial primitive permutations (nonnull center elements).

Figure 16. Some classes of texture.
(a) After 3 x 3 scale processing. (b) After 6 x 6 scale processing.
(c) After 12 x 12 scale processing. (d) After 24 x 24 scale processing.
Figure 17. Sequence of multiscale contour operations for quadrant 4 of toy scene.
(a) After 3 x 3 scale processing. (b) After 6 x 6 scale processing.
(c) After 12 x 12 scale processing. (d) After 24 x 24 scale processing.
Figure 18. Sequence of multiscale contour operations for quadrant 2 of mandrill.